Abstract

A method for estimating power scores is described. By way of 
illustration, it is applied to 21 students who were improperly timed on 
a standard test. Some empirical results are given in support of the 
estimation procedure. 
Power Scores Estimated by Item Characteristic Curves

Frederic M. Lord
Educational Testing Service 

1. Introduction 

The effect of limiting testing time has been investigated in the past
by various empirical studies, set up to treat testing time as a variable
manipulated by the experimenter. In many practical situations, time is
limited and cannot be manipulated, yet it would be desirable to estimate
what the test scores would have been if enough time had been allowed for all
examinees to finish.

This might be done easily if test questions were all of equal difficulty
and of equal discriminating power. In actual testing situations, however,
item characteristic curve theory (Lord & Novick, 1968, chaps. 16-20) is
required.

The present report concerns a situation, not unparalleled, in which a 
group of 21 students was tested on a standard verbal aptitude test under a 
time limit considerably shorter than should have been allowed. This report
describes a tryout of a method for estimating the "power" scores that would 
have been obtained if the students had had enough time to finish. 

In addition to the usual assumptions of item characteristic curve theory,
the method assumes that the students answer the test questions in order.
It also assumes that the students respond as they would if given unlimited
time; that is, if given more time they would not go back and change answers
already given.

Such assumptions probably hold approximately for most students, but not 
all. For this reason, the method outlined here may be of theoretical more 
than of practical interest. 
2. Method

The item characteristic function $P_i(\theta_a)$ represents the probability
that examinee a will answer test question (item) i correctly, where
$\theta_a$ is the "ability" level of examinee a on whatever
psychological dimension is measured by the test. If the test score $x_a$ is
the number of right answers, the expected power score for examinee a on
n questions is given by

$$\mathcal{E}(x_a) = \sum_{i=1}^{n} P_i(\theta_a).$$

Thus, the power score of examinee a on any n questions can be estimated
provided his ability $\theta_a$ and the functions $P_i(\theta)$ can
themselves be estimated from available data.

In practice, some specified mathematical form depending on three item
parameters, descriptive of item i, is assumed for $P_i(\theta)$,
i = 1, 2, ..., n. An essential feature of item characteristic curve theory
is that the item parameters remain the same, regardless (within reasonable
limits) of the group of examinees. Also, the examinee parameters (the
$\theta_a$) remain the same, regardless of the test items administered, so
long as all items are measuring the same psychological dimension.
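
The expected power score can be illustrated with a minimal sketch. The text does not state which three-parameter form was used, so the logistic form below (discrimination a, difficulty b, lower asymptote c, scaling constant 1.7) is an assumption, and the item parameters are hypothetical values chosen only for illustration.

```python
import math

def icc_3pl(theta, a, b, c):
    """Assumed three-parameter logistic item characteristic curve P_i(theta).

    a = discrimination, b = difficulty, c = lower asymptote (guessing level).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def expected_power_score(theta, items):
    """Sum of P_i(theta) over all items: the expected number-right score."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) parameters for a 4-item test, for illustration only.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2), (1.5, 1.0, 0.2)]

for theta in (-1.0, 0.0, 1.0):
    print(theta, round(expected_power_score(theta, items), 2))
```

Because each curve is increasing in ability, the expected score rises with $\theta$ and is bounded by the sum of the lower asymptotes below and by the number of items above.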

The item parameters for each of the n = 90 verbal aptitude items 
composing the mistimed test under consideration were estimated from the 
answer sheets of a convenient group of 994 students, including the 21 
mistimed students. The ability parameter of each of the 21 mistimed 
students was estimated from his responses, ignoring any unanswered items
at the end of the test. The number-right power score of each mistimed
examinee was then estimated from his score on the $n_a$ items actually
answered and from $\sum_{i=n_a+1}^{n} P_i(\hat\theta_a)$, a sum of his
estimated $P_i$ computed from $\hat\theta_a$ (his estimated $\theta_a$)
and from the estimated item parameters.
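
The two steps just described can be sketched as follows. This is only an illustration under stated assumptions: the three-parameter logistic form, the grid-search maximum likelihood estimate of ability, and all function names are hypothetical and are not taken from the report, which does not specify its estimation algorithm. The item parameters are treated as already estimated.

```python
import math

def icc_3pl(theta, a, b, c):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log likelihood of 0/1 responses to the items actually reached."""
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = icc_3pl(theta, a, b, c)
        total += math.log(p) if u else math.log(1.0 - p)
    return total

def estimate_theta(responses, items, lo=-5.0, hi=5.0, steps=1001):
    """Grid-search maximum likelihood estimate (an assumed, simple method)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

def estimated_power_score(responses, items):
    """responses holds 0/1 answers to the first n_a items only.

    Returns (x_a + sum of P_i(theta_hat) over unreached items, theta_hat).
    """
    n_a = len(responses)
    theta_hat = estimate_theta(responses, items[:n_a])
    x_a = sum(responses)
    tail = sum(icc_3pl(theta_hat, a, b, c) for a, b, c in items[n_a:])
    return x_a + tail, theta_hat

# Hypothetical 6-item test; the examinee reached only the first 4 items.
items = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2),
         (1.0, 0.5, 0.2), (1.5, 1.0, 0.2), (1.1, 1.5, 0.2)]
score, theta_hat = estimated_power_score([1, 1, 0, 1], items)
print(round(theta_hat, 2), round(score, 2))
```

For an examinee who reached every item, the tail sum is empty and the estimated power score equals the observed number-right score.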

3. Preliminary Empirical Checks

Granted the two assumptions listed at the end of the first section,
the procedure described above can be justified empirically, without
considering the assumptions of item characteristic curve theory, if we can
show three things:

1. Estimates of item parameters from one group of examinees closely
approximate estimates of the same item parameters from other
groups of examinees.

2. Estimates of ability parameters from part of a test closely
approximate estimates obtained from the entire test.

3. The power score of an examinee on a test can be accurately
approximated from his $\hat\theta_a$, as estimated from the same test.

A very wide variety of empirical checks will have to be carried out
before we can with any assurance outline the circumstances under which
these three statements hold. Scattered evidence so far is favorable, as
suggested below.

1. Lord (1970) shows good agreement between estimates of item
characteristic curves obtained from two rather different
groups of examinees. (It is of interest that the two sets
of curves were obtained by two different methods, one of
which does not involve any of the usual assumptions of item
characteristic curve theory.)

2. Ability parameters for 2,857 examinees were estimated from
their responses to items 41-90 of a 90-item verbal aptitude
test (a parallel form to the test that was mistimed). The
same parameters were reestimated from the same answer
sheets, this time from items 1-29, 41-75. After replacing
all $\hat\theta < -5$ by $\hat\theta = -5$ (for 59 values of
$\hat\theta$), the product moment correlation between the two sets of
estimates was found to be 0.944. This value should be compared to the
correlation of 0.957 from the same answer sheets between number-right
score on items 41-90 and number-right score on items
1-29, 41-75. (The correlations considered in this paragraph
are high because much of the same data is used to determine
both of the variables correlated. This does not invalidate
the present line of reasoning, since a similar overlap is
involved when we estimate the $\theta$ of a mistimed individual
for the whole test from his performance on items to which
he responded.)
3. The product moment correlation between
$\sum_{i=1}^{n_a} P_i(\hat\theta_a)$ and
number-right score on a 60-item arithmetic reasoning test
was found to be 0.981 for 2,946 examinees, where $n_a$ is the
last item answered by examinee a. On a 90-item verbal
aptitude test (parallel to the mistimed test), the
correlation was found to be 0.992 for 2,926 examinees. The
scatter plot for each of these shows a highly linear
relationship, with the points grouped closely around a
straight line going through the origin. The plot for
the 90-item verbal test is shown in Figure 1.

These last results support the conclusion that number-right scores 
for a set of data can be reproduced with high accuracy from item parameters 
and ability parameters estimated from the same data. Results cited 
earlier support the conclusion that, when necessary, the parameters for 
these data can be approximated by estimates obtained from other sets of
data. Thus, power scores can be estimated for examinees who do not have 
time to finish the test. 

Under the mathematical model used, the reason individual points in
Figure 1 fail to fall along a perfectly straight line is that some examinees
are lucky and some are unlucky in answering the particular n questions
administered. The model is a probabilistic one, encompassing these chance
fluctuations. The fact that parameter estimates have been used for
predicting number-right score instead of true values tends to decrease the
scatter of points, since the estimates were made to fit the data.
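
The chance scatter described here can be illustrated by a small simulation. This is a sketch under assumptions, not a reproduction of Figure 1: the three-parameter logistic form, the randomly drawn item parameters, the normal ability distribution, and the sample size are all hypothetical. It uses true parameters rather than estimates, so by the argument above the scatter it shows should be, if anything, larger than with fitted estimates.

```python
import math
import random

def icc_3pl(theta, a, b, c):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def simulate(n_examinees=500, n_items=90, seed=0):
    """Return (expected scores, observed number-right scores) per examinee."""
    rng = random.Random(seed)
    # Hypothetical item parameters (a, b, c), drawn at random.
    items = [(rng.uniform(0.5, 1.5), rng.gauss(0.0, 1.0), 0.2)
             for _ in range(n_items)]
    expected, observed = [], []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        probs = [icc_3pl(theta, a, b, c) for a, b, c in items]
        expected.append(sum(probs))
        # Each response is an independent chance event with probability P_i.
        observed.append(sum(rng.random() < p for p in probs))
    return expected, observed

def pearson(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

expected, observed = simulate()
print(round(pearson(expected, observed), 3))
```

The correlation falls short of 1 only because of the item-by-item chance fluctuations, which is the sense in which some simulated examinees are "lucky" and others "unlucky".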
4. Estimated Power Scores



Table 1 shows for each of the 21 mistimed students his number-right
score $x_a$, his last item answered $n_a$, his estimated ability level
$\hat\theta_a$, and his estimated power score

$$x_a + \sum_{i=n_a+1}^{90} P_i(\hat\theta_a)$$

on the verbal aptitude test. The 21 students are arranged in order of
estimated ability.

Since 5 of the students completed the entire test in spite of the
shortened testing time, the estimated power scores are of interest only
for the remaining 16 students. No empirical check is available for these
16 estimated power scores. The estimates should be valid, however, as
long as the students would not use additional testing time to change
responses they have already given or to answer questions they have
omitted.

The main evidence supporting this last assertion is provided by
results such as appear in Figure 1. The predictions for the 16 students
should be as accurate as the predictions in Figure 1, except for the fact
that the $\hat\theta_a$ for these 16 students were estimated from only
$n_a$ of the 90 test items.
Table 1

Estimated Power Scores for Mistimed Students

Number-right   Last item   Estimated      Estimated
score          answered    ability        power score

68             90           1.51          68
67             90           1.51          67
52             85           0.94          55.6
47             72           0.95          59.8
55             90          [illegible]    55
55             90           0.69          55
48             81           0.65          55.7
51             88           0.60          52.4
45             82           0.47          49.8
44             85           0.41          47.8
[illegible]    85           0.20          45.7
39             88          -0.05          40.2
37             81          -0.07          41.4
38             82          -0.16          41.8
39             87          -0.39          40.2
32             79          -0.41          36.9
32             90          -0.42          32
31             84          -0.76          33.2
29             85          -0.78          30.8
23             85          -1.36          24.4
24             87          -1.46          24.7
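
As a worked reading of one row of Table 1 (assuming the row with number-right score 47, last item answered 72, and estimated power score 59.8 is transcribed correctly), the estimated contribution of the 18 unreached items is the difference between the estimated power score and the number-right score:

$$\sum_{i=73}^{90} P_i(\hat\theta_a) = 59.8 - 47 = 12.8, \qquad \frac{12.8}{90 - 72} \approx 0.71 .$$

That is, the model credits this student with about 0.71 expected right answers per unreached item.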


